A brief introduction to the R/qtl package, with a walk-through of an analysis.
In order to use the R/qtl package, you must type (within R)
library(qtl)
. You may wish to include this in a
.Rprofile
file.
Documention and several tutorials are available at the R archive (https://cran.r-project.org).
Use the help.start
function to start the
html version of the R help.
Type library(help=qtl)
to get a list of the functions
in R/qtl.
Use the example
function to run examples
of the various functions in R/qtl.
A tutorial on the use of R/qtl is distributed with the package and is also available at https://rqtl.org/rqtltour.pdf.
Download the latest version of R/qtl from the R archive or from https://rqtl.org.
Here we briefly describe the use of R/qtl to analyze an experimental cross. A more extensive tutorial on its use is distributed with the package and is also available at https://rqtl.org/rqtltour.pdf.
A difficult first step in the use of most data analysis software is the
import of data. With R/qtl, one may import data in several different
formats by use of the function read.cross
. The
internal data structure used by R/qtl is rather complicated, and is
described in the help file for read.cross
. We won't
discuss data import any further here, except to say that the
comma-delimited format ("csv"
) is recommended. If you have
trouble importing data, send an email to Karl Broman,
broman@wisc.edu, perhaps attaching examples of your data
files. (Such data will be kept confidential.) Also see the sample data
files and code at https://rqtl.org/sampledata/.
We consider the example data hyper
, an experiment on
hypertension in the mouse, kindly provided
by Bev Paigen and Gary Churchill. Use the data
function to load the data.
data(hyper)
The hyper
data set has class "cross"
. The
function summary.cross
gives summary information
on the data, and checks the data for internal consistency. A number
of other utility functions are available; hopefully these are
self-explanatory.
summary(hyper)
nind(hyper)
nphe(hyper)
nchr(hyper)
nmar(hyper)
totmar(hyper)
The function plot.cross
gives a graphical summary of
the data; it calls plotMissing
(to plot a matrix
displaying missing genotypes) and plotMap
(to plot
the genetic maps), and also displays histograms or barplots of the
phenotypes. The plotMissing
function can plot
individuals ordered by their phenotypes; you can see that for most
markers, only individuals with extreme phenotypes were genotyped.
plot(hyper)
plotMissing(hyper)
plotMissing(hyper, reorder=TRUE)
plotMap(hyper)
Note that one marker (on chromosome 14) has no genotype data. The
function drop.nullmarkers
removes such markers from
the data.
hyper <- drop.nullmarkers(hyper)
totmar(hyper)
The function est.rf
estimates the recombination
fraction between each pair of markers, and calculates a LOD score for
the test of \(r\) = 1/2. This is useful for identifying markers that
are placed on the wrong chromosome. Note that since, for these data,
many markers were typed only on recombinant individuals, the pairwise
recombination fractions show rather odd patterns.
hyper <- est.rf(hyper)
plotRF(hyper)
plotRF(hyper, chr=c(1,4))
To re-estimate the genetic map for an experimental cross, use the
function est.map
. The function
plotMap
, in addition to plotting a single map, can
plot the comparison of two genetic maps (as long as they are composed of
the same numbers of chromosomes and markers per chromosome). The
function replace.map
map be used to replace the
genetic map in a cross with a new one.
newmap <- est.map(hyper, error.prob=0.01, verbose=TRUE)
plotMap(hyper, newmap)
hyper <- replace.map(hyper, newmap)
The function calc.errorlod
may be used to assist in
identifying possible genotyping errors; it calculates the error LOD
scores described by Lincoln and Lander (1992). The
calc.errorlod
function return a modified version of
the input cross, with error LOD scores included. The function
top.errorlod
prints the genotypes with values above a
cutoff (by default, the cutoff is 4.0).
hyper <- calc.errorlod(hyper, error.prob=0.01)
top.errorlod(hyper)
The function plotGeno
may be used to inspect the
observed genotypes for a chromosome, with likely genotyping errors
flagged.
plotGeno(hyper, chr=16, ind=c(24:34, 71:81))
Before doing QTL analyses, some intermediate calculations need to be
performed. The function calc.genoprob
calculates
conditional genotype probabilities given the multipoint marker data.
sim.geno
simulates sequences of genotypes from their
joint distribution, given the observed marker data.
As with calc.errorlod
, these functions return a
modified version of the input cross, with the intermediate calculations
included. The step
argument indicates the density of the grid on
which the calculations will be performed, and determines the density at
which LOD scores will be calculated.
hyper <- calc.genoprob(hyper, step=2.5, error.prob=0.01)
hyper <- sim.geno(hyper, step=2.5, n.draws=64, error.prob=0.01)
The function scanone
performs a genome scan with a
single QTL model. By default, it performs standard interval mapping
(Lander and Botstein 1989): use of a normal model and the EM algorithm.
If one specifies method="hk"
, Haley-Knott regression is performed
(Haley and Knott 1992). These two methods require the results from
calc.genoprob
.
out.em <- scanone(hyper)
out.hk <- scanone(hyper, method="hk")
If one specifies method="imp"
, a genome scan is performed by the
multiple imputation method of Sen and Churchill (2001). This method
requires the results from sim.geno
.
out.imp <- scanone(hyper, method="imp")
The output of scanone
is a data.frame with class
"scanone"
. The function plot.scanone
may be
used to plot the results, and may plot up to three sets of results
against each other, as long as they conform appropriately.
plot(out.em)
plot(out.hk, col="blue", add=TRUE)
plot(out.imp, col="red", add=TRUE)
plot(out.hk, out.imp, out.em, chr=c(1,4), lty=1,
col=c("blue","red","black"))
The function summary.scanone
may be used to list
information on the peak LOD for each chromosome for which the LOD
exceeds a specified threshold.
summary(out.em)
summary(out.em, threshold=3)
summary(out.hk, threshold=3)
summary(out.imp, threshold=3)
The function max.scanone
returns the maximum LOD
score, genome-wide.
max(out.em)
max(out.hk)
max(out.imp)
One may also use scanone
to perform a permutation
test to get a genome-wide LOD significance threshold.
operm.hk <- scanone(hyper, method="hk", n.perm=1000)
The result has class "scanoneperm"
. The
summary.scanoneperm
function may be used to calculate
LOD thresholds.
summary(operm.hk, alpha=0.05)
The permutation results may also be used in the
summary.scanone
function to calculate LOD thresholds
and genome-scan-adjusted p-values.
summary(out.hk, perms=operm.hk, alpha=0.05, pvalues=TRUE)
We should say at this point that the function
save.image
will save your workspace to disk. You'll
wish you had used this if R crashes.
The function scantwo
performs a two-dimensional
genome scan with a two-QTL model. Methods "em"
, "hk"
and
"imp"
are all available. scantwo
is
considerably slower than scanone
, and can require a
great deal of memory. Thus, you may wish to re-run
calc.genoprob
and/or sim.geno
with
a more coarse grid.
hyper <- calc.genoprob(hyper, step=10, err=0.01)
hyper <- sim.geno(hyper, step=10, n.draws=64, err=0.01)
out2.hk <- scantwo(hyper, method="hk")
out2.em <- scantwo(hyper)
out2.imp <- scantwo(hyper, method="imp")
The output is an object with class scantwo
. The function
plot.scantwo
may be used to plot the results. The
upper triangle contains LOD scores for tests of epistasis, while the
lower triangle contains LOD scores for the full model.
plot(out2.hk)
plot(out2.em)
plot(out2.imp)
The function summary.scantwo
lists the interesting
aspects of the output. For each pair of chromosomes \((k,l)\), it
calculates the maximum LOD score for the full model, \(M_f(k,l)\); a
LOD score indicating evidence for a second QTL, allowing for epistasis),
\(M_{fv1}(k,l)\); a LOD score indicating evidence for
epistasis, \(M_i(k,l)\); the LOD score for the additive QTL model,
\(M_a(k,l)\); and a LOD score indicating evidence for a second QTL,
assuming no epistasis, \(M_{av1}(k,l)\).
You must provide five LOD thresholds, corresponding to the above five LOD scores, and in that order. A chromosome pair is printed if either (a) \(M_f(k,l) \ge T_f\) and (\(M_{fv1}(k,l) \ge T_{fv1}\) or \(M_i(k,l) \ge T_i\)), or (b) \(M_a(k,l) \ge T_a\) and \(M_{av1}(k,l) \ge T_{av1}\).
summary(out2.em, thresholds=c(6.2, 5.0, 4.6, 4.5, 2.3))
summary(out2.em, thresholds=c(6.2, 5.0, Inf, 4.5, 2.3))
In the latter case, the interaction LOD score will be ignored.
The function max.scantwo
returns the maximum joint
and additive LODs for a two-dimensional genome scan.
max(out2.em)
Permutation tests may also performed with scantwo
;
it may take a few days of CPU time. The output is a list containing the
maxima of the above five LOD scores for each of the imputations.
operm2 <- scantwo(hyper, method="hk", n.perm=100)
summary(operm2, alpha=0.05)
To cite R/qtl in publications, use the Broman et al. (2003) reference listed below.
Karl W Broman, broman@wisc.edu
Broman, K. W. and Sen, Ś. (2009) A guide to QTL mapping with R/qtl. Springer. https://rqtl.org/book/
Broman, K. W., Wu, H., Sen, Ś. and Churchill, G. A. (2003) R/qtl: QTL mapping in experimental crosses. Bioinformatics 19, 889--890.
Haley, C. S. and Knott, S. A. (1992) A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69, 315--324.
Lander, E. S. and Botstein, D. (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121, 185--199.
Lincoln, S. E. and Lander, E. S. (1992) Systematic detection of errors in genetic linkage data. Genomics 14, 604--610.
Sen, Ś. and Churchill, G. A. (2001) A statistical framework for quantitative trait mapping. Genetics 159, 371--387.